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Abstract 

This contribution is devoted to the comparison of various resampling approaches that have been 
proposed in the literature on particle filtering. It is first shown using simple arguments that the so- 
called residual and stratified methods do yield an improvement over the basic multinomial resampling 
approach. A simple counter-example showing that this property does not hold true for systematic 
resampling is given. Finally, some results on the large-sample behavior of the simple bootstrap filter 
algorithm are given. In particular, a central limit theorem is established for the case where resampling 
is performed using the residual approach. 

1 Introduction 

The terms particle filtering or Sequential Monte Carlo (henceforth abbreviated to SMC), refer to a class 
of techniques which have demonstrated a strong potential for signal and image processing applications 
0, |17j . Schematically, the principle behind sequential Monte Carlo may be viewed as the combination 
of two main elements: sequential importance sampling, which dates back to |lbl I12j. and resampling, 
whose importance in the context of SMC was first demonstrated by based on ideas of I n this 
contribution, we focus on the second aspect and consider the comparison of several techniques that have 
been proposed to implement the resampling step. 

To fix the notations, we briefly describe the basic SMC approach known as sequential importance 
sampling with resampling (or SISR). The algorithm proceeds as follows: 

• At time 0, draw m particles {£o}i<j<m from a common probability density ro and compute the 
associated importance weights Wq = ^o(£o)<7o(£o)/ r o(£o) - 

• For successive time indices and for i = 1, . . . , m, simulate Ck+i independently from the past accord- 
ing to a transition density function 1 r(^, •) and update the weights as 

4+i = <4<k4, a+i w(a+i) Ma, a+o- 

In the context of filtering, i^o us the initial distribution of the state variable, q is the transition density func- 
tion corresponding to the, possibly non-linear, state equation (supposed here to be time-homogeneous), 

this contribution it is assumed that all transition kernels K(x, dy) may be written as k(x, y)X(dy), where A is a fixed 
reference measure (which we usually do not specify); k is referred to as a transition density function. When v is a probability 
density function and / a function, we will use the usual notations v(f) = J u(x)f(x)X(dx), kf(x) = J k(x,x')f(x')X(dx'), 
vk(x) = J u(x')X(dx')k(x' , x), and, 

vkf = J u(x)kf(x)X(dx) = J uk(x)f(x)X(dx) 

= JJ v(x)k(x,x')f{x')X(dx)X(dx'). 



and gk is the conditional likelihood of the observation at index k given the corresponding state, viewed 
as a function of the state variable. Then, the self-normalized importance sampling estimator 



rn m 



i=i j=i 



is an estimator of the filtered state moment, that is the expectation of / applied to the non-observable 
state variable at time k given all observations up to time k. Not that the choice r = q is particular in that 
the weight update formula then reduces to oj z k+1 = w\ fffe+lC^fc+i) an< ^ *hus depends only on the previous 
weight and new particle position; when used in conjunction with resampling ideas to be discussed below 
this choice (r = q) is known as the bootstrap filter 

The method sketched so far corresponds to the sequential importance sampling algorithm, whose 
drawback is that it becomes unstable as k increase due to the discrepancy between the weights - a 
phenomenon sometimes referred to as weight degeneracy ^ Chapter 7]. To stabilize the algorithm 
it is necessary to perform resampling sufficiently often. In the following, we denote by k/}i<i<m 
the set of particle positions and associated weights at some generic time index k (which is omitted 
from our notations) and by Q n the cr-field generated by the generations of particles and weights up to 
time k, included. We also assume that the weights have already been normalized, i.e., that J^iLi w * = 
1. Resampling consists in selecting new particle positions and weights {£\w 8 },- =1 m sucn that the 
discrepancy between the resampled weights {u l } i=1 ^ is reduced. Of course, it is also necessary that 
the resampled particle system be as good an approximation to {f ,w ! }i<i< m as possible, in some suitable 
sense. There are a number of options for performing resampling and we focus here on the most widely 
used class of resampling techniques in which the resampling is random and subject to the constraints 



where n is a non-random integer and N 1 = 1 < j < n : £ J 1 = £*} are the particle duplication counts. 
The third constraint is sometimes known as the "unbiasedness" or "proper weighting" condition |15j . Of 
course, it is in general most natural to keep the population size fixed and n is often taken to be equal to 
m. In some situations however it does make sense to consider resampling scenarios in which n and m are 
different, at least for some time indices, and we thus keep separate notations for these two quantities. 

Note that we do not consider here some important resampling algorithms that are either such that 
the population size varies (randomly) after resampling 0] or such that the weights are not constrained 
to be equal after resampling |10|. Our aim with the present contribution is to complement the results 
previously published on resampling in ^3 03 EJ IS] as well as to discuss some conjectures. 

The rest of the paper is organized as follows: Section briefly describes the four main resampling 
methods that have been proposed in the literature which satisfy the constraints mentioned above. Sec- 
tion |3 shows that residual and stratified resampling, as well as the combination of both, improve over 
multinomial resampling in the sense of having lower conditional variance. We also provide a counter- 
example which shows that the same property does not hold for systematic resampling, although its 
empirical performance is generally found to be close to that of residual and stratified resampling. Finally, 
we consider in Section 0] the large sample (i.e., when n increases) behavior of particle filtering methods 
which use these various forms of resampling. We are currently able to show that, in general, central limit 
theorems hold with the residual resampling approach, although the target and proposal distributions 
must satisfy a non trivial condition. 

2 Description of Resampling Algorithms 
2.1 Multinomial Resampling 

The simplest approach to resampling is based on an idea at the core of the bootstrap method [Hj that 
consists in drawing, conditionally upon Q n , the new positions {C}i<i<n independently from the common 
point mass distribution Yl'jLi • ^ n practice, this is achieved by repeated uses of the inversion method: 



OJ. = 1/n, 

E [N l \ g n ] = nujl, for i = 1, . . . , m, 



M = n 



(1) 

(2) 
(3) 
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1. Draw n independent uniforms {U l }i<i< n on the interval (0, 1]; 

2. Set P = D" w (U l ) and = £ 7 , for i = l,...,n, where D™ is the inverse of the cumulative 
distribution function associated with the (normalized) weights {w l }i<i< m , that is, D™ v {u) = i for 
u e (S}=i ^ i Y2j=i U} ^\- When needed, we will denote by £ : {1, . . . , to} — > X the function such 
that £(£) = £*, so that £ 4 may also be written as £ o D" lv (U l ). 

This form of resampling is generally known as multinomial resampling since the duplication counts 
N , . . . , N m arc by definition distributed according to the multinomial distribution Mult(n; lo 1 , . . . , oj m ). 



2.2 Residual Resampling 

Residual resampling, or remainder resampling, is mentioned by |19j . |15j as an efficient means to decrease 
the variance due to resampling. In this approach, for i = 1, . . . , to, we have 

N l =[nuj' l \+N\ (4) 

where |_ J denotes the integer part and N 1 , . . . , N n are distributed according to the multinomial distribu- 
tion Mult(n - R; Co 1 , . . . , w") with R = £™ i l nujt \ and 

w* = 4r^< *=l..--,m. 5 

n — R 

This scheme obviously satisfy (j2J. In practice, the multinomial counts N 1 ,...,^" from the residual 
multinomial distribution are generated as in the multinomial resampling approach described above. 



2.3 Stratified Resampling 

Stratified resampling - see an d EE Section 5.3] - is based on ideas used in survey sampling and 
consists in pre-partitioning the (0, 1] interval into n disjoint sets, (0, 1] = (0, 1/n] U • • • U {{n — l}/n, 1]. 
The U l s are then drawn independently in each of these sub-intervals: XJ % ~ U (({i — 1} /n,i/n]), where 
U([a, b]) denotes the uniform distribution on the interval [a, b]. Then the inversion method is used as in 
multinomial resampling. It is easily checked that, as was the case for residual sampling, the difference 
between the duplication count N l and its target value mo 1 is less than one in absolute value (for all is). 
In addition. 



E 



i/n 



E 



E/°e°aj? v (tf*) 



'* fi/n " L 

= nJ2 fo^oDT(u)du = nJ2^.f(C), 

l=1 J(i-l)/n l=1 

for all integrable functions /, showing that this algorithm also satisfies JSJ. 



2.4 Systematic Resampling 

Systematic resampling takes the previous method one step further by dcterministically linking all the 
variables drawn in the sub-intervals. This is achieved by setting 

U< = (i-l)/n + U, 

where U is a single random draw from the U((0, 1/n]) distribution. Since the U l s generated this way 
obviously have the same marginal distribution as those used in the stratified resampling approach, the 
method still satisfies . It was introduced in the particle filter literature by [2] as "stratified" sampling 
but it is also mentioned by |19j under the name of universal sampling. It is often preferred due to its 
computational simplicity and good empirical performance. As pointed out by |14j however, it is the only 
resampling method for which the resulting particle positions £* are no more independent given Q n . Thus, 
studying its performance is much harder than for other methods. 
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A final remark of some importance is that both stratified and systematic resampling are sensitive to 
the order in which the particles are ordered: a simple permutation of the indices of the particles before 
resampling changes the distribution of the new resampled set of particles. In contrast, residual resampling 
behaves more like the basic multinomial resampling approach in that it disregards the order in which the 
particles are numbered. 



3 Basic Properties of Sampling Schemes 
3.1 Multinomial Resampling 

For multinomial resampling, the selection indices I 1 , . . . ,I n are conditionally i.i.d. given Q n and thus the 
conditional variance is given by 



Var 



1 ™ 

71 ^ 



gn 



i=l 



1 ™ 



(6) 



3.2 Residual Resampling 

The residual sampling estimator may be decomposed into 

[nuo' l \ 



n-R 



n L — ' L — ' n n £ — ' 

»=1 i=l i=l 



(7) 



where I 1 , . . . ,I n ~ R are conditionally independent given Q n with distribution P(P = j\G n ) = & for 
i = 1, ... ,n — R and j = 1, . . . , m. Because the residual resampling estimator is the sum of one term 
that, given Q n , is deterministic and one term that involves conditionally i.i.d. draws, the conditional 
variance of residual resampling is given by 



■ Var 



n-R 



E /(e r 



i=i 



Q" 



n - R 



Var 



Q" 



(8) 



1 7X1 

-E wi / 2 (^) 

n — > 



2=1 



E 



f 2 (C) - 



n-R 



E^/(r 



To compare (JEJ with first write 

E-W)=E^ 



n - i? 



E^/(f)- 



Then note that the sum of the m numbers \jllo' 1 \ jn plus (n — i?) /n equals one, whence this sequence 
of m + 1 numbers can be viewed as a probability distribution. Thus Jensen's inequality applied to the 
square of the right-hand side of the previous display yields 



^E 

i=i 



M /2(f) 



n - R 



5>W 



. i=i 



Combining with (JHl , this shows that the conditional variance of residual sampling is always smaller than 
that of multinomial sampling given by ©. 
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3.3 Stratified Resampling 

Because U 1 , . . . , U n are still conditionally independent given Q n for this method, 



By Jensen's inequality, 



Var 



1 " 

n ^ — ^ 



i=l 



n 

^^Var[/oeo^ v ([/ l )|5«] 

i=l 

-E^V^E »/ 



'(i-l)/n 



1 ™ 



i/n 
(t-l)/n 



-1 2 



> 



showing that the conditional variance of stratified sampling is always smaller than that of multinomial 
sampling. Note that stratified sampling may be coupled with the residual sampling method discussed 
previously: the proof above shows that using stratified sampling on the R residual indices that are indeed 
drawn randomly can then only decrease the conditional variance. It is also clear that the fact that the 
conditional variance is reduced does not depend on the particular choice of the sub-intervals (as being 
the intervals {{i — l}/n, i/n]), more general partitions could be considered as well. 



3.4 Systematic Resampling 

For this last sampling scheme, it is much more complicated to provide a usable expression of the condi- 
tional variance due to all the rcsampled particles being (conditionally) dependent |14j . We can however 
provide a simple counter-example to the frequently encountered conjecture that systematic resampling 
dominates multinomial resampling in terms of conditional variance. 

Consider the case where the initial population of particles {C}i<i<n is composed of the interleaved 
repetition of only two distinct values xq and x\, with identical multiplicities (assuming n to be even). In 
other words, 

{£*}l<i<n = {X0,Xi,X ,X!, . . . ,Xq,Xi}. 

We denote by 2ui/n the common value of the normalized weight to 1 associated to the n/2 particles £ l 
that satisfy = xi, so that the remaining ones (which arc such that £ l — xq) share a common weight of 
2(1 — Lo)jn. Without loss of generality, we assume that 1/2 < lo < 1 and denote by |/| = \ f(xi) — /(xo)|. 

Under multinomial resampling, © shows that the conditional variance of the estimate n _1 X)"=i /(£*) 
is given by 

= i(l-cH/| 2 . (9) 
n 

In this particular example, it is straightforward to verify that residual and stratified resampling are 
equivalent - which is not the case in general - and amount to deterministically setting n/2 particles to 
the value x\ (because the value 2io/n is assumed to be larger than 1/n), whereas the n/2 remaining 
ones are drawn by n/2 conditionally independent Bernoulli trials with probability of picking x\ equal 
to 2oj — 1. Hence the conditional variance, for both the residual and stratified schemes, is equal to 
n^ 1 (2uj — 1)(1 — w)|/| 2 . It is hence always smaller than ©, as expected from the general study of these 
two methods. Note that for specific configurations of the weights, such as when cu gets close to 0.5, the 
resampling becomes quasi-deterministic when using residual or stratified resampling and the improvement 
over the basic multinomial scheme becomes all the more significant. 

In contrast, systematic resampling also deterministically sets n/2 of the £ l to be equal to x± but 
depending on the draw of the initial shift, all the n/2 remaining particles are either set to x\, with 



n 

Var -E/(lmuit) 



n 

i=l 
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probability 2uj — 1, or to xq, with probability 2(1 — lo). Hence the variance is that of a single Bernoulli 
draw scaled by n/2, that is, 



Var 



1 n 

- ^ /(Isyst; 



0" 



( W -l/2)(l- W )|/| 5 



note that in this case, the conditional variance of systematic resampling is not only larger than for 
most values of to (except when oj is very close to 1/2), but it does not even decrease to zero as n grows! 
Clearly, this observation is dependent on the order in which the initial population of particles is presented. 
It is easy to verify (using simulations) that, in this example, systematic resampling becomes very similar 
to residual/stratified resampling if the particles are randomly permuted before resampling. Hence, the 
above counter-example probably correspond to a "rare" situation. It does however show that systematic 
resampling is a variance reduction method which is not as robust as systematic and residual resampling 
and also suggest that theoretical study of the behavior of systematic resampling probably is a very hard 
task. 



4 Large- Sample Behavior of Resampling 

We now come to the question of assessing the large sample behavior of particle filtering methods based on 
various forms of resampling. The behavior of basic particle filtering methods when using the multinomial 
resampling has been extensively studied in [5J . For reasons of space and simplicity we only consider here 
the case of the bootstrap filter (i.e., when the transition kernel q of the hidden chain is used as proposal) 
where resampling is performed at each time index. In this basic case, each iteration of the particle filtering 
algorithm may be decomposed into two successive steps: 

Prediction Given the population of unweighted particles at time index k, {£, k }i<i<m, extend each 
trajectory conditionally independently according to Ck+i ~ ■)> 

Filtering After computing the weights as 

in 

i=i 

perform resampling to obtain the new unweighted population of particles {££. +1 }i<i<ri- 

The choice of a particular resampling approach does obviously impact only on the second of these two 
steps. 

To establish central limit theorems for the algorithm above, one can use repeatedly the two theo- 
rems below which are adapted from ^ Chapter 9] where the corresponding results are stated under 
slightly more general assumptions. The current population of particle is assumed to satisfy the following 
assumptions. 



G 



Assumption 1. 



ft) {£*}i<»<m are consistent (in probability) and satisfy a central limit theorem (as m —> oo) for a 
density v and all bounded functions f, where cr 2 (f) denotes the asymptotic variance, that is, 



TO * — ' 

i=l 

-E/(^)-K/) 



K/) 



i=l 



N(0,a 2 (/)) 



/or bounded functions f. 



(ii) The weights are given by to 1 = <7(£ 1 )/ 52j=i 9(£ J )i "where g(x) = n(x)/v(x) for a probability density 
function fx; g is bounded from above and may be known up to a constant only. 



Theorem 2. Under Assumption^j \(y\ new particles {C+}i<i<m distributed conditionally independently 
under £5. ~ <?(£%■) are consistent for vq and all bounded functions f with asymptotic variance 



c-l(f)=v[qf 2 -(qf) 2 } +<J 2 (qf) 



(10) 



Theorem 3. Under Assumption if (a) the resampled particles are conditionally independent given 
Q n , (b) n — > oo with n/m — ► a, and, (c) 



n Var 



1 ™ 



(ii) 



that is deterministic, then {£ ? }i<i< n are consistent and satisfy a central limit theorem for fi and all 
bounded functions f with asymptotic variance 



a 2 (/)=K(/)+a<7 2 (J[/- M (/)]) 



(12) 



Following the argument of |14l ITTj . by repeatedly applying Theorems and El one may prove that the 
particle filter, when considered at any finite time index k, does satisfy a central limit theorem. The 
variance formula in (|10|) is a simple instance of the Rao-Blackwcll theorem whereas 1)120 shows that the 
limit of the conditional variance of resampling gets added to the variance of (self-normalized or Baycsian) 
importance sampling scaled by the factor a. This latter factor is interesting as it shows that using n <C to 
may render the variance of the particle estimator almost independent of what happened in previous steps. 
This phenomenon should not be over-interpreted however as it only occurs because the sum is normalized 
by n, and not to (or m + n) which is more connected with the actual number of operations required to 
implement the method. Note that the requirement that g be bounded, which is not very restrictive in 
the filtering context, may be relaxed - see Chapter 9] for details. 

With multinomial resampling, © and the consistency directly implies that «(/) = m(/ 2 ) — [m(/)] 2 
that is the variance under the target density fi. For other resampling schemes however, showing that (| 1 1 1) 
holds is all but trivial. We consider in the sequel the case of residual resampling. By JHJ), 



n Var 

m 

= £ 

i=l 
m 

= £ 



1 ™ 

n — ' 



i = l 



Q" 



(13) 



11 EJ E^/(f) 



I) 



. i=l 



E 




i-E 



\nuj l \ 
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Under Assumption Q f° r all bounded function /, 



E wI / 2 (f) 



i=i 



tin 



and Y^iLi^ /(£*) — ► £*(/)■ However the case of sums that involve integer parts cannot be handled 
similarly and require the following technical lemma. 

Lemma 4. Under Assumption QJ if n — » oo zot£/i n/m — » a and fl| x . a £( x ) g n}j = 0, then for all 
bounded function f , 



E 

i=l 







'} 




_ V - 





Proof. Recall that a/ = g(£,' 1 )/ Y^jLi 9(&) with = n{x)/v(x). For any A' > 1, define the set 

ml i I 

2^ — ^— /(Dl{Qg(^)e(K,oo)U((0,if)n8 f c)} 
i=l 

m 

^ E )l{ Q5 (e-)G(K,o O )u((0^)ni3K)} 

/(a;)l{a ff (x)fE(if,oo)u((o^)ni3 K )}M(a;)A((ix), 

where the notation 1 stands for the indicator function. The limit on the right-hand side of the last display 
can be made arbitrarily small by taking A sufficiently large because J f(x)l^ ag ^^yn(dx)X(dx) = and 
g is bounded by Assumption ^ For any K > 1, there exists r\ > such that 

x 1 {ai,(c*)e(o,jr)\B*-} (M ~ M^)J) =0. 
Combining the above with nj Y^JLi ~~ a anc l 

2^ ^ )H ag (^)€(0,K)\B K } 



[ag(x)\ 



a 



f(x)l {ag ^ x)&{ oj <) \B K }iy(x)X(dx), 



yields 



ml i I 

2^ — /(C) 1 {as(e*)e(o,iir)\BK} 



[ag(a;)J 



/(x)l {Q , s ( x ) 6 ( 0) i<:)\ Bjf }^(a;)A(da;). 



The proof follows by letting K — > oo. 

Corollary 5. Under Assumption^ and assuming that (i C^-l x - a^(a;)eN}J = ^' 

n 

1 E/« l 



□ 



n Var 



i=l 



«(/) 



{(hl?jM 



Q 



a // 
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for the residual sampling method. Hence, the resampled particles satisfy a central limit theorem with 
limiting variance given by (|12fl . 

The variance formula given in Corollary [3] was first derived in [3] which however lacked a rigorous 
proof of Lemma 0] and the necessity of the support condition - see [H] for a counter-example showing 
that this condition is indeed necessary and non-trivially satisfied. Note also that the asymptotic variance 
found in Corollary is obtained as the (rescaled) limit of the conditional variance and is thus smaller 
than in the case where multinomial resampling is used (see Section l3~^|) . 

5 Conclusions 

In practical applications of sequential Monte Carlo methods, residual, stratified, and systematic resam- 
pling are generally found to provide comparable results. Despite the lack of complete theoretical analysis 
of its behavior, systematic resampling is often preferred because it is the simplest method to implement. 
From a theoretical point of view however only the residual and stratified resampling methods (as well as 
the combination of both) may be shown to dominate the basic multinomial resampling approach, in the 
sense of having lower conditional variance for all configurations of the weights. A central limit theorem 
as been established for the residual sampling approach. It is likely that a similar result can be obtained 
for stratified sampling, based on Theorem The situation is however somewhat more involved in this 
latter case due to the fact that the new resampled particles, although still conditionally independent, 
have a distribution which depend on the order in which the particles are initially labelled. 
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